Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data

نویسندگان

  • Shomona Gracia Jacob
  • Rency S. Varghese
  • Zhen Zhang
  • Jianhua Xuan
  • C. Y. V. Watanabe
  • M. X. Ribeiro
  • C. Traina
چکیده

This paper highlights the significance of classification in data mining and knowledge discovery. In this paper we investigate the performance of various data mining classification algorithms viz. Rnd Tree, Quinlan decision tree algorithm (C4.5), KNearest Neighbor algorithm etc., on a large dataset from the „Wisconsin Breast tissue dataset‟ (derived from the UCI Machine Learning Repository) that comprises of 11 attributes and 106 instances. The results of this study indicate the level of accuracy and other performance measures of the algorithms in detecting the presence of breast cancer and the associated breast tissue conditions that increase the risk of developing cancer in future. Moreover the importance of feature selection/reduction in improving the performance of classification algorithms is also described. The classification algorithm Rnd Tree produced 100 percent accuracy for classification of all the training data under multiple classes. The classification algorithm was also applied to verify it‟s correctness in classifying test data. General Terms Data Mining, Breast tissue research

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identification of Fraud in Banking Data and Financial Institutions Using Classification Algorithms

In recent years, due to the expansion of financial institutions,as well as the popularity of the World Wide Weband e-commerce, a significant increase in the volume offinancial transactions observed. In addition to the increasein turnover, a huge increase in the number of fraud by user’sabnormality is resulting in billions of dollars in lossesover the world. T...

متن کامل

Identification of Fraud in Banking Data and Financial Institutions Using Classification Algorithms

In recent years, due to the expansion of financial institutions,as well as the popularity of the World Wide Weband e-commerce, a significant increase in the volume offinancial transactions observed. In addition to the increasein turnover, a huge increase in the number of fraud by user’sabnormality is resulting in billions of dollars in lossesover the world. T...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011